On Performance and Cache Effects in Substring Indexes
Abstract
This report evaluates uncompressed and compressed substring indexes in terms of build time, space usage, and search performance. It shows how the structures react to increasing data size, alphabet size, and repetitiveness in the data. The main contribution is the strong relationship it demonstrates between time performance and locality in the data structures. For example, with a large alphabet, suffix tree construction can be sped up by a factor of 16, and query lookup by a factor of 8, if the list of children of each node is stored in a dynamic array instead of a linked list, at the cost of about 20% more space. For enhanced suffix arrays, query lookup is up to twice as fast if the data structure is stored as an array of structs instead of a set of arrays, at no extra space cost.
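The two layouts compared for enhanced suffix arrays can be illustrated with a minimal sketch. It is not the report's implementation: the function names are hypothetical, only the suffix and LCP tables are kept (the child table is omitted), and the construction is deliberately naive. Python will not reproduce the reported speedup; the sketch only shows that both layouts hold the same entries in the same number of bytes, while the array of structs keeps the fields of one entry adjacent in memory.

```python
import struct
from array import array

# One (suffix start, LCP) record made of two native C ints.
RECORD = struct.Struct("ii")


def build_suffix_array(text):
    # Naive construction by sorting suffixes; fine for a small illustration.
    return sorted(range(len(text)), key=lambda i: text[i:])


def build_lcp(text, sa):
    # lcp[i] is the length of the longest common prefix of the suffixes
    # at sa[i - 1] and sa[i]; lcp[0] is 0 by convention.
    lcp = [0] * len(sa)
    for i in range(1, len(sa)):
        a, b = sa[i - 1], sa[i]
        k = 0
        while a + k < len(text) and b + k < len(text) and text[a + k] == text[b + k]:
            k += 1
        lcp[i] = k
    return lcp


def as_set_of_arrays(sa, lcp):
    # Layout 1: one contiguous buffer per field.
    return array("i", sa), array("i", lcp)


def as_array_of_structs(sa, lcp):
    # Layout 2: a single buffer with the fields of each entry interleaved.
    buf = bytearray(RECORD.size * len(sa))
    for i, (suffix, length) in enumerate(zip(sa, lcp)):
        RECORD.pack_into(buf, i * RECORD.size, suffix, length)
    return buf


def entry_soa(tables, i):
    # Reads entry i from two separate buffers.
    sa, lcp = tables
    return sa[i], lcp[i]


def entry_aos(buf, i):
    # Reads entry i from one contiguous record.
    return RECORD.unpack_from(buf, i * RECORD.size)
```

A lookup that needs both the suffix position and the LCP value of entry i reads two distant memory locations in the first layout and one contiguous record in the second, which is the kind of locality difference the abstract refers to.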
Related resources
Buffering Accesses to Memory-Resident Index Structures
Recent studies have shown that cache-conscious indexes outperform conventional main memory indexes. Cache-conscious indexes focus on better utilization of each cache line for improving search performance of a single lookup. None has exploited cache spatial and temporal locality between consecutive lookups. We show that conventional indexes, even “cache-conscious” ones, suffer from significant ca...
Cache-Conscious Concurrency Control of Main-Memory Indexes on Shared-Memory Multiprocessor Systems
Recent research addressed the importance of optimizing L2 cache utilization in the design of main memory indexes and proposed the so-called cache-conscious indexes such as the CSB+-tree. However, none of these indexes took account of concurrency control, which is crucial for running the real-world main memory database applications involving index updates and taking advantage of the off-the-shel...
A Tabu-Based Cache to Improve Range Queries on Prefix Trees
Distributed Hash Tables (DHTs) provide the substrate to build large scale distributed applications over Peer-to-Peer networks. A major limitation of DHTs is that they only support exact-match queries. In order to offer range queries over a DHT it is necessary to build additional indexing structures. Prefix-based indexes, such as Prefix Hash Tree (PHT), are interesting approaches for building dis...
Approximate String Matching with Lempel-Ziv Compressed Indexes
A compressed full-text self-index for a text T is a data structure that requires reduced space and is able to search for patterns P in T. Furthermore, the structure can reproduce any substring of T, thus it actually replaces T. Despite the explosion of interest in self-indexes in recent years, there has not been much progress on search functionalities beyond the basic exact search. In this paper...
Reduction in Cache Memory Power Consumption based on Replacement Quantity
Today power consumption is considered to be one of the important issues. Therefore, its reduction plays a considerable role in developing systems. Previous studies have shown that approximately 50% of total power consumption is used in cache memories. There is a direct relationship between power consumption and replacement quantity made in cache. The less the number of replacements is, the less...
Publication date: 2007